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Abstract 

From bird flocks to fish schools and ungulate herds to insect swarms, social biological aggregations are found across the 
natural world. An ongoing challenge in the mathematical modeling of aggregations is to strengthen the connection 
between models and biological data by quantifying the rules that individuals follow. We model aggregation of the pea 
aphid, Acyrthosiphon pisum. Specifically, we conduct experiments to track the motion of aphids walking in a featureless 
circular arena in order to deduce individual-level rules. We observe that each aphid transitions stochastically between a 
moving and a stationary state. Moving aphids follow a correlated random walk. The probabilities of motion state transitions, 
as well as the random walk parameters, depend strongly on distance to an aphid's nearest neighbor. For large nearest 
neighbor distances, when an aphid is essentially isolated, its motion is ballistic with aphids moving faster, turning less, and 
being less likely to stop. In contrast, for short nearest neighbor distances, aphids move more slowly, turn more, and are 
more likely to become stationary; this behavior constitutes an aggregation mechanism. From the experimental data, we 
estimate the state transition probabilities and correlated random walk parameters as a function of nearest neighbor 
distance. With the individual-level model established, we assess whether it reproduces the macroscopic patterns of 
movement at the group level. To do so, we consider three distributions, namely distance to nearest neighbor, angle to 
nearest neighbor, and percentage of population moving at any given time. For each of these three distributions, we 
compare our experimental data to the output of numerical simulations of our nearest neighbor model, and of a control 
model in which aphids do not interact socially. Our stochastic, social nearest neighbor model reproduces salient features of 
the experimental data that are not captured by the control. 
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Introduction 

From bird flocks to fish schools and ungulate herds to insect 
swarms, nature abounds with examples of animal aggregations [1— 
3]. These groups may arise from environmental factors, social 
factors, or a combination of the two. Environmental factors induce 
organisms to move in relation to food sources, light sources, 
gravity, predators, wind, chemical gradients, and more. On the 
other hand, even in the absence of significant environmental cues, 
some animals aggregate because of their intrinsic social tendencies. 
Social forces such as attraction, repulsion and alignment occur 
when these organisms interact, sensing each other via sight, smell, 
hearing, and so forth [4-8]. Social aggregations not only are 
examples of natural pattern formation, but on long time and space 
scales may influence disease transmission, food supply availability, 
ecological dynamics, and ultimately, evolution [9,10]. Additional- 
ly, the understanding of aggregations has been used to design 
algorithms in robotics, computer science, and engineering [11,12]. 

A central question in the study of aggregations pertains to the 
relationship between individual-level and group-level behaviors, 



and it is crucial to distinguish between these. Individual-level 
behaviors might include an organism's tendency to move closer to 
conspecifics, or to align its movement with that of its neighbors. 
Group-level properties describe characteristics of many individu- 
als, such as the shape of an aggregation, its spatial density 
distribution, and its velocity distribution. The connection between 
individual and group-level behaviors is highly nontrivial, as is 
typical for a complex system [13]. One methodology for exploring 
this connection is through mathematical modeling. By construct- 
ing mathematical models that describe each individual organism's 
rules for movement, one can simulate and analyze the ensemble to 
investigate the aggregate behavior. Indeed, aggregation modeling 
is the subject of an intensive effort in the mathematical modeling 
community, explored in [5,14—23] and many dozens of other 
studies. There exists a menagerie of mathematical models for 
aggregation. One criteria that distinguishes models is the degree to 
which randomness plays a role. Models can be completely 
deterministic, deterministic but with an added noise component, 
or completely stochastic. Models for random movement of 
biological organisms (such as the one we will presently develop) 
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often take the form of random walks [24,25] or Levy flights 
[26,27]. 

An ongoing challenge in aggregation modeling is to construct 
individual-level rules that are quantitatively accurate and well-tied 
to experimental data. Sometimes, modelers may attempt to 
calibrate models and infer parameters based on published field 
observations or experimental results, for example, as with recent 
studies of locust swarms [28,29]. A more direct approach is to 
conduct experiments that track the motion of individuals and use 
the data, namely time series of organisms' positions and velocities, 
to construct models more directly. This approach has enhanced 
the understanding of fish schools [30], starling flocks [31], and 
duck formations [32]. Presently, we consider social aggregation of 
the pea aphid, Acyrthosiphon pisum. These particular aphids are 
significant both because they are severe crop pests [33] and 
because they are a model organism in biology for studying disease 
transmission, insect-plant interactions, phenotypic plasticity, and 
more [34]. 

Some foundational results on pea aphid movement appear in 
[35]. Aphids moving on the ground exhibit two dispersal 
behaviors: searching and running. In the searching behavior, 
aphids look for a nearby plant to inhabit. Running aphids, in 
contrast, travel far away from their original host plant, likely in an 
effort to evade predators. In [35], aphids were exposed to predators 
while feeding on alfalfa plants. As a defense mechanism, aphids 
dropped from their feeding site and then traveled away from the 
original host plant. The average searching aphid made one turn 
every 6.67 s and traveled 0.27 cm/s while the average running 
aphid turned less frequently, every 27.8 s and traveled faster, at 
0.67 cm/s. In a given experimental run, aphids generally did not 
shift between the searching and running behaviors. 

In the absence of predators, some aphids move infrequendy 
[35]. When aphids are attacked by predators, the aphids employ 
defense mechanisms such as dropping from their location, 
running, or emitting a fluid droplet from the cornicle, a tube on 
the dorsal side of the last segment of the insect. The fluid droplet is 
composed of a mechanical protectant which temporarily paralyzes 
the jaws of the attacker [36] and alerts nearby conspecifics and 
heterospecifics to the danger [37]. The experiments in [38] 
investigate the emission of this fluid droplet further by prodding 
aphids of various ages on the anterior portion of their thorax and 
recording the aphid as an emitter or non-emitter. Pre-reproductive 
aphids are the most likely age group to emit this fluid droplet, 
plausibly because they often live in close proximity to highly 
related kin. Once the aphids reach adulthood, it is more 
advantageous to invest energy in reproduction. 

Despite the aforementioned account of chemical signaling, and 
while it is well-known that aphids aggregate around food sources 
[39], much less is known about whether certain aphid species form 
aggregations that are intrinsically social. Aphid species Uroleucon 
nigrotuberculatum and Uroleucon caligatum experience lower mortality 
from generalist predators when aggregated [40], suggesting an 
evolutionary advantage for social aggregation. Other results on 
aphid aggregation appear in [40-43,43]. In [43], pea aphids were 
placed in a chamber with five identical feeding stations. If the 
insects did not aggregate socially, one would expect an even 
distribution of aphids in each chamber, but this distribution was 
not observed. In both light and dark conditions, the aphids 
aggregated mainly in one or two of the feeding stations. Aphids in 
a dark environment still aggregated at statistically significant levels, 
albeit less strongly than in lit conditions, suggesting that vision may 
be one of the senses through which aggregation is activated. In 
contrast, in a key test in [43], artificial aphids were placed behind 
the feeding stations such that their shadows were clearly visible. 



The aphids in the chamber were then allowed to choose one of the 
five feeding stations at random. In this test, the chamber with the 
aphid dummies did not show greater likelihood of being chosen, 
which implies that vision is not the only mechanism enabling social 
aggregation. 

The experiments of [41,42] found that lime aphids, Eucallipterus 
tiliae, aggregate socially. Three studies in [42] are especially 
relevant. In the first study, an aphid was allowed to move and 
settle on a particular, uninhabited leaf. Its final position was 
marked and the aphid was removed. Trials were repeated on the 
same leaf with different individuals whose final positions were 
similarly marked. The distribution of settling locations was 
random, suggesting that microhabitats on a leaf do not influence 
aphids' movement. However, when multiple individuals were 
allowed to settle simultaneously on the leaf, they aggregated, 
suggesting that social interactions influence their movement. In the 
second study, between one and eleven aphids were already settled 
on a leaf, and one target aphid was placed on the leaf. When the 
target aphid approached a settled aphid (with approach defined as 
walking within 1 cm) on 82% of the trials, the target aphid settled 
within 1 cm of the other settled aphid. The third study examined 
aphid distribution for different population densities. In this study, 
each aphid had an associated virtual territory, defined as a circle of 
fixed radius around the insect, identical for all individuals. In 
experimental trials, the group was allowed to approach an 
equilibrium configuration. Then, the percent leaf coverage was 
computed as the area of the union of the territories divided by the 
area of the leaf. As the number of aphids was increased, the 
percent leaf coverage rose with decreasing slope, indicating close 
packing of the insects, ostensibly due to social interactions. 

Given the evidence for social aggregation in some aphid species, 
our goal at present is to assess and model aggregation of the pea 
aphid. More specifically, in order to deduce individual-level rules, 
we conduct experiments to track the motion of aphids walking in a 
featureless circular arena. We observe that each aphid transitions 
stochastically between a moving and a stationary state. Moving 
aphids follow a correlated random walk. The probabilities of 
stopping and starting, as well as the random walk parameters, 
depend strongly on distance to an aphid's nearest neighbor. For 
large nearest neighbor distances, when an aphid is essentially 
isolated, its motion is ballistic. Aphids move faster, turn less, and 
are less likely to stop. In contrast, for short nearest neighbor 
distances, aphids move more slowly, turn more, and are more 
likely to become stationary; this behavior constitutes an aggrega- 
tion mechanism. From the experimental data, we estimate the 
state transition probabilities and correlated random walk param- 
eters as a function of nearest neighbor distance. With the 
individual-level model established, we assess whether it reproduces 
the macroscopic patterns of movement at the group level. To do 
so, we consider three distributions, namely distance to nearest 
neighbor, angle to nearest neighbor, and percentage of population 
moving at any given time. For each of these three distributions we 
compare our experimental data to the output of numerical 
simulations of our nearest neighbor model, and of a control model 
in which aphids do not interact socially. Our social nearest 
neighbor model reproduces salient features of the experimental 
data that are not captured by the control. 

Experimental Methods 

To host aphid colonies, we grew fava bean plants, Vicia faba 
(Johnny's Selected Seeds, Winslow, ME) with 6—7 seeds per pot in 
an approximately 20°C laboratory setting at 60%-70% relative 
humidity. We stored plants in 45 cm x 45 cmx45 em mesh enclosures 
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Figure 1. Visualizations of aphid movement in experiment. (A) Trajectories of 28 aphids during approximately 15 min of one experimental 
trial, as determined by motion tracking of video data. The green circle is the experimental arena with radius 20 cm. (B) Blow-up of a subset of a single 
aphid trajectory, shown in a 10 cm x 10 cm zoom. 
doi:1 0.1 371 /journal.pone.0083343.g001 



(BugDorm, Taichun, Taiwan). Plants received 12 hr of continuous 
light per day from a 120 W grow lamp suspended 5-7.5 cm above 
the enclosure, or 25-30 cm above the plants. We considered plants 
to be mature enough to host aphids approximately two weeks after 
planting, when they reached a height of 1 5 cm above the flower pot 
rim. We colonized each plant with one hundred pea aphids, A. pisum 
(Nasco, Fort Atkinson, WI). We periodically cleaned enclosures 
when dirt or dead aphids accumulated. By seven days after 
colonization, plant health would deteriorate due to aphid feeding. 
At this point, we transferred the colony to fresh plants in a fresh 
enclosure. Aphids were then given several days to acclimate before 
being used in experimental trials. 

We performed experiments on a vibration isolation table 
(IsoStation, Newport Corp., Irvine, CA) in a darkened lab in 
order to minimize effects of the ambient environment. The 
experimental arena consisted of a polypropylene circular ring, with 
a radius of 20 cm and height of 3/16 in, enclosed between two 1/8 
in thick glass plates. We underlit the arena with a 24mx24m LED 
light panel (AnythingDisplay, Nashua, NH) having a 6500°A pure 
white color temperature. In order to remove debris that might 
interfere with imaging, and to remove any biological material that 
might potentially be left from previous experimental runs, we 
cleaned the top and bottom glass plates with acetone, ethanol and 
compressed air before every trial. We lined the arena wall and 
ceiling with silicone oil to discourage aphids from occupying the 
arena's walls and ceiling. 

Aphids are dimorphic insects that may develop into winged or 
wingless forms, depending on a complicated interaction between 
genetics and environment [44]. Since we wished to track two- 
dimensional motion, and in order to minimize any behavioral 
variations due to age, we restricted our experimental trials to adult 
wingless aphids (as identified by sight). Adult pea aphids have a 
body length of approximately 2-4 mm [45] . To initiate a trial, we 
selected individuals from a colonized plant, typically selecting a 
mix of aphids who appeared to be stationary and moving. Three 
trials incorporated 8, 10 and 18 aphids; the remaining six trials 
incorporated 27-35 aphids moving in the arena. We filmed the 
experiment using a 1080 p high definition video camera (Sony 
Handycam HDR-SR12) placed 1.1m above the arena, with white 
balance calibrated to adjust for the effect of the light box as a 
background. After 45 minutes of filming we ceased recording and 
returned aphids to the colony. 

To prepare our data for motion tracking, we converted raw 
video footage in.mts format to.mp4 using Handbrake video 



processing software with sampling in grayscale at 5 fps. We used 
QuickTime Pro to export the video into an image sequence of.tiff 
files, downsampled to 256 grays and 2 fps to facilitate data 
processing. Using the ImageJ image processing package [46] we 
removed initial frames of each trial during which overhead lights 
were reflected, and cropped the rectangular video frames to a 
circular region corresponding to the experimental arena. We 
further processed images using M atlab's Image Processing 
Toolbox and the u-track 2.0 motion tracking package [47]. 
Specifically, we converted color images to black and white ones (to 
render the inside of the arena black) and denoised each frame. We 
ran u-track, which forms trajectories by linking identified aphid 
positions from frame to frame using a Kalman filter for motion 
propagation. The tracking process resulted in more trajectories 
than the number of aphids used in the trial due to the inherent 
difficulty of motion tracking. That is to say, a single aphid's track 
across the course of an experimental trial may be recognized as 
several, shorter trajectories by the tracking algorithm, but this does 
not affect our data analysis and modeling (more details appear in 
subsequent sections). Finally, we converted tracked aphid positions 
from pixel coordinates to real coordinates. Fig. 1 shows examples 
of tracked data. 

To prepare our raw data set for modeling (see next section) we 
enhanced it with several elementary, derived pieces of data, 
namely motion state (stationary or moving), step length (distance 
traveled in one frame), heading, turning angle, and distance to 
nearest neighbor. An aphid's step length in a current frame was 
calculated as magnitude of the difference between its current and 
previous positions. We considered an aphid to be moving in a 
given frame if its step length was sufficiently large. For small steps, 
corresponding to speeds less than 4x10 2 cm/s (about 1/10 body 
length per second), we assumed the aphid to be stationary, with 
the small amount of movement attributed to noise in the video 
itself and errors in the aphid identification and tracking 
algorithms. An aphid's heading (the direction it was traveling in 
a given frame) was calculated by taking the angle of the difference 
between the aphid's current and previous position vectors. Finally, 
we calculated turning angle in a given frame as the difference in 
the current and previous heading. Our final data set consists of 1.2 
million entries from the pooled data of nine experimental runs. 
Each entry contains an aphid's position, motion state, step length, 
heading, and turning angle. 
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Mathematical Modeling of Individual-Level 
Behaviors 

Based on the observation that aphids in the experimental trials 
transitioned between stationary and moving behavior, we propose 
a probabilistic two-state model to describe aphid movement and 
social interaction dynamics. Let Pms represent the probability that 
a moving aphid in a given frame transitions to a stationary state in 
the next frame. Similarly, let P$m represent the probability that a 
stationary aphid in a given frame transitions to a moving state. 
Perhaps the simplest model that accounts for social interactions 
allows these probabilities to depend solely on the distance to an 
aphid's nearest neighbor, d. The underlying biological assumptions 
leading to this model are that aphids sense isotropically (perhaps 
due to a combination of visual, auditory, and olfactory inputs), that 
they are affected by the minimum possible social information, and 
that they do not react to the speed and orientation of their 
neighbor. We will show that this minimal model reproduces 
certain salient features of the experimental data. 

Moving aphids appear (naively) to follow a correlated random 
walk [24]; see Fig. IB. In an (unbiased) correlated random walk, 
an individual walks in a straight line of a certain (random) step 
length I, turns from its previous heading at an angle 8 that is 
random but drawn from a mean-zero distribution, and then 
repeats. In our model, we will assume that the correlated random 
walk parameters depend solely on distance to nearest neighbor, 
similar to the transition probabilities discussed above. For step 
length, we choose the simplest model, meaning that there is no 
spread in the step length distribution. A moving aphid's step length 
£ depends deterministically on its distance to nearest neighbor d. 
For turning angle 8, the mean of the distribution is zero by the 
assumption of symmetry of the correlated random walk. There- 
fore, we model dependence on d in the spread p of the turning 
angle distribution. 

We will now quantify our four model parameters: probability of 
a moving aphid stopping (Pms), the probability of a stationary 
aphid starting to move (Psm), a moving aphid's step length traveled 
in one frame (£), and the spread of the turning angle distribution 
that a moving aphid obeys (p). Each of these will depend on 
distance to an aphid's nearest neighbor d through simple 
functional forms with three or four parameters, which we estimate 
from experimental data below. 

To estimate the transition probabilities Pms an d Psm, we note 
that our data set (see previous section) includes a motion state for 
each entry. We can classify every transition that occurs in the data 
set as stationary to stationary (SS), stationary to moving (SM), 
moving to moving (MM) or moving to stationary (MS). We divide 
the data set in two, with SS and SM in one subset and MM and MS 
in the other. For each subset, we generate bins of 800 data points 
where binning is performed according to d. Within each bin, we 
estimate the probability of a transition as the ratio of the number 
of occurrences of the transition to the total number of 
observations. For instance, within a given bin, we estimate P M s as 

_ M ^occurrences 

MS MS occurrences + MM occurrences ' 

We then form a scatterplot of the probability within each bin 
versus the midpoint of the bin, resulting in Fig. 2. 

The probability Pms, shown in Fig. 2A, appears to decrease 
monotonically with d and level off. We model this decrease with 
the functional form 



P MS (d) = P MS + {P°ms ~ PMs)e- d/dMS ■ (2) 

The probability Here, P° MS represents the probability that an 
aphid will become stationary when infinitesimally close to its 
nearest neighbor, whereas P^s is the probability of transitioning 
when isolated, that is, even in the absence of sensed neighbors. 
The length scale dj^s characterizes the transition between the two 
limiting regimes of d. The choice of a decaying exponential 
function not only agrees well with the data (as discussed presendy) 
but has biological motivation. If one assumes that the motion state 
transition occurs due to sensing, and that the sensory input an 
aphid receives has a constant probability of failure per distance 
displaced from its source, then one obtains an exponential model, 
a common choice for aggregation modeling [48]. Overall, the 
model Eq. (2) reflects aphids being more likely to settle near other 
individuals, in order to aggregate. 

To fit Eq. (2) to the experimental data, we first observe that P MS 
and P° MS appear linearly while d M s appears nonlinearly. We 
minimize the root-mean-square (RMS) error of the fit by scanning 
across values of dMs an d at each value, performing a least squares 
fit for the two linear parameters. We find P^ s «0.1280, 
P° MS & 0.5508, and cIms~ 0.0 134 m, resulting in a fit (shown as 
the blue curve) with a high coefficient of determination, R 2 = 0.92. 
To give a further sense of the efficacy of the fit, it is helpful to 
consider the standard error in each bin, which is given by 

Pms(\—Pms) /,n 

V N ' (3) 

where N is the number of aphids per bin and Pms = PudA is the 
probability of transition within the bin. Green squares (red dots) 
represent bins for which the corresponding model prediction is 
within (outside of) two standard errors of the estimated Pms- 

The probability Psm is shown in Fig. 2B. Unlike Pms which 
decreases monotonically, P S m has a minimum at short distances. 
We choose the functional form 

PsM(d)=P° SM t- d l d SM + P^ M ^^. (4) 



The first (exponential) term models collision avoidanceThe first 
(exponential) term is repulsive, consistent with the notion that 
aphids avoid settling too close to others. The second (rational) term 
is a "loneliness" term, capturing that aphids move more when they 
are in isolation.is attractive, modeling the tendency of solitary 
aphids to move in order to aggregate. Together, these two terms 
specify a particular distance at which an aphid is most likely to be 
stationary (namely the value of d that minimizes Psm, which for 
our parameters is approximately 0.014 m). We fit this functional 
form to the data through a procedure similar to Pms, except that 
we must now search over a grid of two nonlinear parameters, d$M 
and A SM . We find P° SM w 0.1587, Pf M « 0.3552, d SM « 0.0079 m, 
and Asm x 0.0739 m. To compare each data point to the model, 
we use the same green square/red circle scheme as above. The 
overall fit has P 2 = 0.53. This coefficient of determination, 
substantially lower than for P M s, is likely due to the large scatter 
of the data for large d, which may reflect two sources of error. 
First, imaging and tracking of aphids is more difficult when they 
are in the vicinity of the boundary of the arena, and aphids at large 
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Figure 2. State transition probabilities P MS and P MS as a function of distance to an aphid's nearest neighbor, d (in m). (A) P MS , the 
probability that an aphid moving in a given timestep becomes stationary at the next timestep. Each data point represents the probability within a bin 
of 800 elements from our experimental data set, where the data are binned by d. The probability is calculated via a simple frequency count according 
to Eq. (1). The overall dependence of the data on d is modeled with Eq. (2), which describes an increased probability of an aphid settling if a neighbor 
is nearby. Best fit parameters appear in the text; the coefficient of determination is R 2 = 0.92. To give a further sense of the efficacy of the fit, we 
display each point according to the standard error of the mean within the bin it represents. If the model curve passes within two standard errors of 
the estimated value, we show it as a green square; otherwise, it is a red dot. (B) Like (A), but for the probability P SM that a stationary aphid starts 
moving. The model is Eq. (4), describing higher aphid mobility at very short and very long d. Here, ft 2 = 0.52; see text for discussion. 
doi:1 0.1 371 /journal.pone.0083343.g002 



d are more likely to be near a boundary. Second, it is possible that 
there is an explicit effect of the boundary on aphids' behavior 
which we have not modeled here. 

We tried several functional forms (including linear combinations 
of exponentials) but choose Eq. (4), which minimizes the RMS 
error with two pairs of parameters. We believe the exact functional 
form is less important than the trends of higher mobility at both 
very short and very long distances. 

We now turn to the parameters governing moving aphids' 
correlated random walks. Fig. 3 shows the mean step length as a 
function of d, with each point in the scatterplot corresponding to a 
bin of 800 data points. Because there is a coherent rise in the data 
for small d, we consider the model 

l(d) = l co + (f°-£ co )e- d/d f. (5) 



According to this model, aphids with neighbors nearby take 
short steps, and the step length increases and saturates as d 
increases. Using a similar fitting procedure to Pms an d Psm, we 
find ^wQ.OOlSm, l°«0.0003m, and <4«0.0074w. Within 
each bin, the standard error around the mean is s/y/N where A" is 
the number of observations and s is the sample standard deviation. 
To compare experimental bins with the model prediction, we use 
the same green squares/red dot visualization as above. For the 
overall fit, we find R 2 = 0.82. The data decrease moderately from 
our model curve for d > 0. 1 m, which is half the radius of our 
experimental arena. Once again, we believe that we may be seeing 
biases due to the boundary and the increased difficulty of motion 
tracking near the boundary. 

Finally, we model the spread of the distribution of turning 
angles 9. We bin 9 values by d with 2400 values per bin (larger 
than the previously used value of 800 in order to help reduce the 
standard error within each bin). As alluded previously, within 
every data bin, the distribution is strongly peaked around zero; see 
the examples in Fig. 4B and Fig. 4C. Therefore, to capture the 
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0 0.05 0.1 0.15 0.2 

d(m) 



Figure 3. Correlated random walk step length I (in m) per frame as a function of distance to an aphid's nearest neighbor of (in m). 

Each data point represents the mean step length within a bin of 800 elements from our experimental data set, where the data are binned by d. The 
overall dependence of the data on d is modeled with Eq. (5), which captures the tendency of aphids to aggregate simply by traveling less when in the 
vicinity of others. Best fit parameters appear in the text; the coefficient of determination is R 2 = 0.82. To give a further sense of the efficacy of the fit, 
we display data points according to the same scheme used in Fig. 2. Green squares (red dots) represent data bins for which the model prediction falls 
within (outside) two standard errors of the experimental mean. 
doi:1 0.1 371 /journal.pone.0083343.g003 



effect of neighbors, it is necessary to model the spread of the 
distribution of 9, which indeed appears to depend on d. Since 9 is 
an angular distribution, it exists on the interval [ — n,Tt]. Wrapped 
normal distributions give a poor fit to our data (not shown). We 
instead select the wrapped Cauchy distribution [49] centered at 
zero, 



f{6)- 



1 



1 



27i 1 + p 2 — 2p cos 9 ' 



(6) 



where 0 < p < 1 is a parameter governing the spread of the 
distribution. Small values of p correspond to more spread 
distributions, whereas values closer to one result in strongly 
peaked distributions. Fig. 4A shows p as a function of d for the 
binned data. As a model, we select the functional form 



p(d) = p aj + (p°-p aj )e-'i / 'ii>. 



(7) 



According to this model, aphids with nearby neighbors will turn 
more often at wider angles, resulting in motion that is less ballistic 
and more diffusive. 

Fitting the model as described previously, we find p m «0.9013, 
p°«0.1387, and d p x 0.0044 m. To compare the experimental 
data and the model within a given bin, we calculate a 95% 
confidence interval by resampling the data in each bin thousands 
of times, calculating p each time, and considering the resulting 
distribution of values of p. If the value of p predicted by our model 
falls within the central 95 % of the sampling distribution, we show 
the data point in Fig. 4A as a green square; otherwise it is a red 
dot. For the fit of Eq. (7), we find R 2 = 0.99. 

In summary, our model consists of just four quantities: Pms, 
Psm, t, an d p. Each of these depends on d via three or four 



parameters. In total, we have fit 1 3 parameters, but we note that 
there are over one million entries in our data set. 

As alluded previously, one component ignored in the model is 
the arena's boundary. While it is quite likely that the presence of a 
boundary wall influences aphids' movement, the majority of our 
data set is composed of aphids far from the boundary. Fig. 5 shows 
the cumulative distribution function of distance to boundary for 
the entire data set. Only 10% of our data is within 2 cm of the 
boundary (4 or 5 aphid body lengths), and we leave the 
quantification of boundary effects as future work. 

With our model for individual-level behavior established, we 
will presendy assess the degree to which it reproduces group-level 
behaviors. For comparison and contrast, we also consider a control 
model in which aphids do not interact at all. For this non- 
interaction model, we use the asymptotic (limit of large d) values of 
the parameters in our individual-level model. That is, we set 



Pms = Pmsi Psm~- 



P$ M ,t=r,*.nA P = p a 



Simulation and Analysis of Group-Level Behaviors 

We now shift our focus to group-level behaviors. We compare 
the experimental data [EXP] with data simulated from the two 
models developed above, namely the one in which aphids interact 
with their nearest neighbor (model INT) and the one in which 
aphids do not interact (model jVOjV). For each model, we carry out 
simulations parallel to each experimental run, that is, having the 
same initial aphid positions and containing the same number of 
frames. We augment the individual-level behaviors with a rule for 
what simulated aphids do if they encounter the (simulated) arena 
boundary. If an aphid travels to a new position that would be 
outside of the arena, we apply a simplistic reflective boundary 
condition in which the angle of incidence on the boundary equals 
the angle of reflection. Also, we let the distance the aphid travels 
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Figure 4. Correlated random walk turning angle 0. (A) Turning angle distribution parameter 0<p<1 as a function of distance to an aphid's 
nearest neighbor. Here, p is a parameter in the zero-mean wrapped Cauchy distribution Eq. (6) used to model turning angle 0. Each data point 
represents the experimentally measured value of p within a bin of 2400 elements from our experimental data set, where the data are binned by d. 
The overall dependence of the data on d is modeled with Eq. (7), which captures the tendency of aphids to aggregate by taking wider turns when in 
the vicinity of others, leading to motion that is more diffusive and less ballistic. Best fit parameters appear in the text; the coefficient of determination 
is R 2 = 0.99. Green circles (red dots) points represent data bins for which the model prediction falls within (outside) a 95% confidence interval around 
the experimentally measured p, where the interval is constructed by resampling our original data 20,000 times. (B) Normalized histogram showing 
the experimental turning angle distribution within the data bin corresponding to the magenta triangle in (A). The blue curve shows the wrapped 
Cauchy distribution predicted by our model. (C) Like (B), but for the magenta diamond. 
doi:10.1371/journal.pone.0083343.g004 



once it reflects off the wall be the distance it would have travelled 
beyond the boundary. 

We will compare three different group-level behaviors by 
studying their corresponding cumulative distribution functions as 
computed across each data set. A cumulative distribution tells, for 
any particular value of a data variable (horizontal axis) the 
percentage of data in the data set that is less than or equal to that 
value (vertical axis). It will be convenient to call our cumulative 
distributions F? , Fj NT , Ff 0N , where the subscript i indexes the 
distribution (since it is discrete). Our strategy will be to make three 
pairwise comparisons for each group-level behavior, namely Ff xp 
vs. F[ NT , Ff XF vs. Ff m , and F\ NT vs. Ff 0N . It is also convenient 
to define the underlying probability distributions, f } p , fJ NT , 
j-non p Qr eac j 1 pairwise comparison we will calculate several 
different quantities. A simple comparison is the distance between 
median values of the probability distributions, which we refer to as 
A. Another choice is the Kolmogorov-Smirnov distance Dks 
[50,51], a common nonparametric measure. For two cumulative 



distributions Fj and G,, Dks = maXfl-F/— G,|the maximum 
vertical distance between two cumulative distributions. Finally, 
we consider the Kullback-Leibler divergence D^l [52]. This 
quantity measures the information lost when a distribution ff is 
used to approximate another distribution, f{ . It is defined as 

i V; / 

where for us, the superscript 1 and 2 will refer to one of our three 
data sets. Results appear in Table 1, Table 2, Table 3. We do not 
perform statistical hypothesis testing using A, D^s, and D^l 
because we have no null hypothesis that our models and 
experiment produce statistically indistinguishable data. Rather, 
we expect that they are different, and we simply use empirical 
measures to assess the closeness of the model distributions to the 
experimental one. 
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Figure 5. Cumulative distribution of aphids as a function of distance to arena boundary (in m) for experimental data set. The circular 
experimental arena has a radius of 0.2 m. Only 10% of the data set corresponds to aphids within 2 cm (about five body lengths) of the boundary. 
doi:1 0.1 371 /journal.pone.0083343.g005 



The first group-level behavior we consider is the distribution of 
nearest neighbor distances d that emerges through an experiment 
or simulation. The cumulative distributions are shown in Fig. 6(A), 
with EXP as solid blue, INT as dashed green, and NON as dot- 
dashed red. Statistical measures are given in Table 1 . We see that 
A is smaller for EXP vs. INT than for EXP vs. NON by 
approximately a factor of two. Put differently, the shorter median 
(/for INT (as opposed to NON) indicates that the social behaviors in 
the model indeed promote aggregation. The experimental curve 
has an even shorter d. Model INT appears to capture some (but 
not all) of the aggregative tendency seen in the experiment. The 
Kolmogorov-Smirnov distance, D/^s, is smaller between EXP and 
ZATthan EXP and NON, as is Dkl- Thus, by all three measures, 
INT captures more of the experimental behavior than NON does. 

The second group-level behavior we consider is the distribution 
of angle to nearest neighbor, <j>, measured relative to an aphid's 
heading 0. The cumulative distributions and statistical informa- 

Table 1. Measures comparing cumulative distributions of 
distance to nearest neighbor d in experiment (EXP), a social 



interaction model (INT) and 
(NON). 


a noninteracting control model 




Comparison 


Ax 


D KS 


D KL 


EXP vs. INT 


0.0046 


0.1083 


0.0835 


EXP vs. NON 


0.0159 


0.3226 


0.3873 


INT vs. WOW 


0.0113 


0.2181 


0.1668 



By measures of the difference between median values A (in m), the 
Kolmogorov-Smirnov distance D KS , and the Kullback-Leibler divergence D KLl 
the cumulative distribution of INT comes closer to EXP than WOW does. Since A 
is a dimensioned quantity, it is meaningful to compare values to an aphid body 
length, approximately 0.004 m. EXP and INT have median values that differ by a 
body length, while the other two comparison have median differences an order 
of magnitude larger. The three distributions are shown in Fig. 6(A). 
doi:1 0.1 371 /joumal.pone.0083343.t001 



tion appear in Fig. 6(B) and Table 2. The graph reveals that EXP, 
INT, and NON all give rise to a uniform distribution of relative 
orientation (reflected by the linear cumulative profile). Therefore, 
aphids in experiment and in both models do not preferentially 
align towards their nearest neighbors. 

Finally, we consider the third group-level behavior, the 
distribution of the fraction M% of aphids moving at a given time. 
The cumulative distributions and statistical information appear in 
Fig. 6(C) and Table 3. They are strikingly different. As with the 
distributions for d, INT reproduces much more of the behavior of 
EXP than NON does. The extreme rightward shift of the red curve 
indicates that the mobility of aphids is much higher in NON, put 
differently, aphids in this model do not aggregate and settle nearly 
as much as in EXP and INT. 

Conclusion 

Through experiment and modeling, we have investigated the 
movement, social behavior, and aggregation of the pea aphid. 
Motion-tracked experimental data gives rise to a two-state model 
in which aphids transition stochastically between stationary and 
moving states. Moving aphids follow a correlated random walk. 
The state transition probabilities Pms and Psm, the random walk 
step length £, and the random walk turning angle distribution 
spread p all depend on distance to an aphid's nearest neighbor, d. 
These four quantities have each been fit with a functional form 
incorporating three or four parameters whose values we estimated 
from the experimental data. To assess the efficacy of our model in 
reproducing group-level behaviors, we compared experimental 
data to outputs of our social nearest neighbor model and a control 
(noninteracting) model. We found that the social model repro- 
duces the distribution of nearest neighbors and the distribution of 
fraction of moving aphids better than the control model. The 
experiment and both models display no difference for a third 
group-level property, namely angle to nearest neighbor. 

Our mathematical model is strikingly different from some 
previous data-driven aggregation models. The model of golden 
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Figure 6. Pea aphid group-level behaviors in experiment, a social interaction model, and a control (non-interacting) model. (A) 

Cumulative distributions F, of distance to nearest neighbor d (in m) for experimental data set (solid blue), social interaction model (dashed green), 
and non-interacting model (dot-dashed red). (B) Like (A), but the cumulated quantity is angle to nearest neighbor tj> (relative to an aphid's heading 0). 
(C) Like (A), but the cumulated quantity is M%, fraction of the aphid population moving in a given frame. As compared to the curves in (A) and (B), 
the more staircase-like appearance of these curves arises simply from the fact that the variable being cumulated is discrete (percentage of aphids in a 
group of several dozen) as opposed to the continuous variables in (A) and (B). For (A)-(C), measures of the difference between the distributions are 
given in Tables 1-3 respectively. 
doi:1 0.1 371 /journal.pone.0083343.g006 



Table 2. Measures comparing cumulative distributions of 
angle to nearest neighbor ^ in experiment {EXP), a social 



interaction model {INT) and 
{NON). 


a noninteracting control model 




Comparison 


Ax 


D KS 


D KL 


EXP vs. INT 


0.0431 


0.0085 


0.0152 


EXP vs. NON 


0.0352 


0.0128 


0.0135 


INT vs. NOW 


0.0078 


0.0057 


0.0035 



By measures of the difference between median values A, the Kolmogorov- 
Smirnov distance Dks. and the Kullback-Leibler divergence Dkl, the cumulative 
distributions for INT, NON, and EXP are nearly identical. Since <j) is an angle 
measured in radians, the values of A should be compared to the value 2n. The 
three distributions are shown in Fig. 6(B). 
doi:1 0.1 371 /journal.pone.0083343.t002 



Table 3. Measures comparing cumulative distributions of 
fraction of aphids moving M% in experiment (EXP), a social 
interaction model {/NT) and a noninteracting control model 
(NON). 







Comparison 


Ax 


D KS 


D KL 


EXP vs. INT 


0.0774 


0.3226 


0.7789 


EXP vs. NON 


0.471 1 


0.8915 


0.8373 


INT vs. NON 


0.3938 


0.8806 


1 .6649 



By measures of the difference between median values A, the Kolmogorov- 
Smirnov distance D KS , and the Kullback-Leibler divergence D KL , the cumulative 
distribution of INT comes closer to EXP than NON does. This is especially 
apparent in the A and Dxs values. The three distributions are shown in Fig. 6(C). 
doi:1 0.1 371 /journal.pone.0083343.t003 
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shiner fish in [30] and the model of surf scoter ducks in [32] are 
primarily deterministic, describing organisms that simultaneously 
attract, repel, and align. In these studies, noise additively 
modulates an organism's intended direction at each time step, 
presumably to describe errors in sensing and movement capabil- 
ities. In contrast, our model has rules that are fundamentally 
random. Fig. 2 shows that aphids under similar conditions (same 
distance to nearest neighbor) display different behaviors (transi- 
tioning vs. not transitioning motion state). Fig. 3 and Fig. 4 suggest 
that the movement process for aphids is a random walk. 

The biological conclusions of our work are as follows. First, we 
have provided strong quantitative evidence that pea aphids display 
social behavior, in that an individual's movement in a featureless 
environment is influenced by its nearest neighbor. 

Second, we have gained insight into the mechanism by which 
aphids aggregate. The probability of a stationary aphid starting to 
move decreases if a neighbor is nearby. The probability of a 
moving aphid stopping increases if a neighbor is nearby. These 
two behaviors promote aggregation. Further, aphids that are 
moving take shorter steps and turn more when in the vicinity of 
neighbors, promoting motion that is more diffusive and less 
ballistic (that is, less likely to move it away from the neighbor). This 
is reminiscent of the classic run-and- tumble model of bacteria [53]. 
In short, aggregation occurs through movement decreasing in the 
proximity of other aphids as opposed to direct locomotion towards 
individuals or clusters. 

ThirdFinally, our model of individual-level behavior gives some 
feeling for the sensing range of the aphid. We recall the 
exponential length scales rf J vfs«0.0134 m, d$M * 0.0079 m, 
d(K 0.0074 m, and d p » 0.0044 m. These characteristic length 
scales are on the order of 1-3 aphid body lengths. 

As evidenced by the metrics in the previous section, our 
individual-based social model reproduces group-level featuresbe- 
haviors muchbetter than a control model. There remain many 
avenues for further investigation. While we have demonstrated 
that pea aphid behavior promotes aggregation, we have not 
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